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ABSTRACT 

We report Gemini-South GMOS observations of the exoplanet system WASP-29 
during primary transit as a test case for differential spectrophotometry. We use the 
multi-object spectrograph to observe the target star and a comparison star simul- 
taneously to produce multiple light curves at varying wavelengths. The 'white' light 
curve and fifteen 'spectral' light curves are analysed to refine the system parameters 
and produce a transmission spectrum from ~515 to 720 nm. All light curves exhibit 
time-correlated noise, which we model using a variety of techniques. These include a 
simple noise rescaling, a Gaussian process model, and a wavelet based method. These 
methods all produce consistent results, although with different uncertainties. The pre- 
cision of the transmission spectrum is improved by subtracting a common signal from 
all the spectral light curves, reaching a typical precision of ~lxlO -4 in transit depth. 
The transmission spectrum is free of spectral features, and given the non-detection of 
a pressure broadened Na feature, we can rule out the presence of a Na rich atmosphere 
free of clouds or hazes, although we cannot rule out a narrow Na core. This indicates 
that Na is not present in the atmosphere, and/or that clouds/hazes play a signifi- 
cant role in the atmosphere and mask the broad wings of the Na feature, although 
the former is a more likely explanation given WASP-29b's equilibrium temperature of 
~970K, at which Na can form various compounds. We also briefly discuss the use of 
Gaussian process and wavelet methods to account for time correlated noise in transit 
light curves. 

Key words: methods: data analysis, stars: individual (WASP-29), planetary systems, 
techniques: spectroscopic, techniques: Gaussian processes 



1 INTRODUCTION 

The study of transiting exoplanets is rapidly advancing our 
understanding of planets beyond our own Solar system. 
Planets' bulk densities are obtained via measurement of the 
radius and mass using transits and radial velocity measure- 
ments; this is the first step in understanding their structure 
and composition. However, to understand planetary systems 
more fully and explore their diversity we need spectroscopic 
measurements of their atmospheres; luckily, transiting plan- 
ets allow such measurements without requiring the star and 
planet to be spatially resolved. 



E-mail: ngibson@eso.org 



Transmission spectroscopy is a measurement of the ef- 
fective size of the planet as a function of wavelength during 
primary transit. Due to wavelength dependent opacities in 
the atmosphere, the planet appears larger at wavelengths 
where the atmosphere absorbs or scatters light. We can 
therefore probe for the presence of atomic and molecular 
species, as well as clouds or hazes (Seager & Sasselov 2000 



Brown 2001). Until recently, transmission spectroscopy has 
only been feasible using space-based telescopes, which have 
been tremendously successful, particularly at optical wave- 
lengths (e.g |Charbonneau et al |2002||Pont et al.|2008||Sing 



et al.|2008||2011||Huitson et al.|2012| ). Measurements in the 
NIR have proved more controversial, with different groups 
reporting conflicting conclusions from the same dataset (e.g. 
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Swain et al.|20 Q8; Gibs on et al.|2011 ); however, advances in 
data analysis techniques are starting to resolve the issue 
(e.g. |Gibson et aL|2012b| |Waldmann||2012| ), along with the 
availability of more stable NIR cameras such as WFC3 (e.g. 
Berta et al.|2012| [Gibson et al.|2012a ). 

Most recently, the use of a multi-object spectro- 
graph (MOS) to perform differential spectro-photometry has 
started to show that ground based observations can also play 
an important role. This technique was pioneered by Bean 
et al. (2010), using the VLT to produce a transmission spec- 



trum of the super-Earth GJ-1214b. Such observations have 
benefitted not only from advances in observing strategy, but 
also the availability of new bright targets with nearby com- 
parison stars. Similar observations have since been made in 
the near- infrared ( Bean et al.|2011 ) and with long-slit spec- 
trographs ( |Sing et al.||2012 ). Here we report Gemini obser- 
vations of WAS P-29 using this MOS technique. 

WASP-29b ( |Hellier et aT1|2Q10j) is a Saturn-size planet 
with a mass and radius of 0.24 ±0.02 Mj and 0.79±0.05i?j. 
It orbits a K4 dwarf with a period of ^3.9 days. Given its 
relatively low equilibrium temperature of ^980 K, it does 
not have a particularly large scale height compared to the 
hottest and lowest-density hot Jupiters. It was in fact cho- 
sen as a test case for the Gemini GMOS instrument, to test 
its stability before more favourable targets are observed. 
WASP-29 has a nearby, bright comparison star, making it 
ideal for differential photometry or spectroscopy. 

Despite its recent successes, MOS differential spec- 
troscopy, like other techniques, is expected to suffer from 
correlated noise originating from poorly understood instru- 
mental effects. Although the differential nature of the mea- 
surements is intended to reduce this problem, systematic 
effects are not yet fully understood for ground based differ- 
ential photometry, and the extra complexity in spectroscopic 
work may introduce more systematic effects. In this paper, 
we explore the use of several methods to analyse transit light 
curves to obtain useful measurements in the presence of sig- 
nificant correlated noise. These methods include the wavelet 



method of Carter &; Winn (2009), and the Gaussian process 
(GP) model of Gibson et al. ( |2012b| ) adapted to model time- 
correlated noise. 

This paper is structured as follows: Sect. [2] presents our 
observations and data reduction procedure, in Sect. [3] we 
present our transit analysis methods, and finally in Sects. ^] 
and [5] we present our results and conclusions. 



2 GMOS OBSERVATIONS AND DATA 
REDUCTION 

A transit of WASP-29 was observed using the 8-m Gemini- 
South telescope with the Gemini Multi-Object Spectrograph 
(GMOS) on October 19 2011. Data were taken as part of 
program GS-2011B-Q-13 (PI. Gibson). GMOS has an imag- 
ing field-of-view of 5.5x5.5 arcmin squared, and consists of 
three 2048x4608 pixel CCDs arranged side by side with 
small gaps in-between. We used GMOS in multi-object mode 
to observe the target plus two comparison stars simultane- 
ously and continuously for ~5.1 hours, covering the 2.66 
hour transit plus 1.5 hours prior to ingress and 1.0 hours 
after egress. Conditions were photometric for the duration 
of the observations. 



Observations used the R400 grism + OG515 filter with a 
central wavelength of 725 nm. The dispersion is 0.07 nm per 
(unbinned) pixel, giving wavelength coverage from ~515- 
940 nm. Exposure times were 30.5 seconds and 2x2 binning 
was used. The full frame readout of the GMOS chips in 2x2 
binning is ^55 seconds (in the recommended 'slow' read- 
out). In order to reduce the overheads we read out only 
three regions of interest (ROI) on the chip containing the 
three stellar spectra, resulting in a readout time of ^22 sec- 
onds, and therefore a cadence of ^53 seconds. This allowed 
348 exposures over the 5.1 hours of observation. To minimise 
slit losses we created a mask with slits of 30 arcsec length 
and 10 arcsec width for the three stars, giving seeing limited 
(therefore variable) resolution ranging from R~430-860 at 
700 nm. Calibrations were taken before and after the science 
observations, and consisted of flat fields and arc lamp expo- 
sures. Further arcs were taken with a calibration mask. This 
was almost identical to the science mask but with narrow 
slits of 1 arcsec to enable more precise wavelength calibra- 
tion. 

The data were reduced using the standard GMOS 
pipeline contained in the Gemini IRAfQ/PyRAF^] package. 
First the ROI images were processed to be in the standard 
GMOS format. Basic reductions included bias subtraction 
and wavelength calibration. Fringing is particularly large at 
the red end of the spectrgj^] (>750nm). We tried correct- 
ing for this using the flat fields, however this proved par- 
ticularly problematic given that our spectral flat fields were 
taken with slit widths much larger than the PSF of the star. 
Flat fields taken with the calibration mask were corrupted. 
Given this, we decided not to apply flat-fielding. In differen- 
tial spectrophotometry, flat fielding is not necessary provid- 
ing the sensitivity ratios between the target and comparison 
stars in each wavelength channel remain constant. Of course, 
this is often not the case if there is movement of the spectra 
on the CCDs, seeing variations, etc. However, correcting all 
the data using the same flat field cannot account for such 
effects anyway (although it might mitigate against them), 
given spectral flat-fields are PSF (at least when using wide 
slits) and wavelength dependent. Simple flat-fielding is also 
unable to correct for severe fringing at the level required for 
transmission spectroscopy. 

Wavelength calibrated spectra for the 3 stars were ex- 
tracted using the GSEXTRACT routine, with an aperture of 
8 pixels (varying the aperture by a few pixels had little ef- 
fect) after sky subtraction. A few pixel columns (the spa- 
tial direction) showing significant temporal variation were 
masked from the extraction. Examples of extracted spec- 
tra of WASP-29 and the two comparison stars are shown 
in Fig. [l] showing uncorrected fringing effects at the red 
end. The spectra were divided up into several spectral re- 
gions, defining independent wavelength channels. To extract 
the light curves for each wavelength channel, the spectra 



1 IRAF is distributed by the National Optical Astronomy Ob- 
servatory, which is operated by the Association of Universities 
for Research in Astronomy (AURA) under cooperative agreement 
with the National Science Foundation 

2 PyRAF is a product of the Space Telescope Science Institute, 
which is operated by AURA for NASA 

3 Fringing is not so problematic using the GMOS-North detec- 
tors, where future observations are planned. 
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were summed in each of the spectral regions to produce a 
flux time-series of the target and comparison stars for each 
channel. The target flux was then divided through by the 
sum of the comparison stars' flux to produce multiple light 
curves at each wavelength channel, which we hereafter re- 
fer to as the 'spectral' light curves. We experimented with 
different numbers of wavelength channels, which affect the 
resolution and signal to noise, and finally settled on 30 across 
the whole spectral range, which are marked by the vertical 
dashed lines in Fig. [I] We divided each of the three chips 
into 10 channels each of the same pixel widths, ignoring the 
gaps between detectors in the images. 

Correlated noise is present to some degree in all the 
light curves, but is particularly bad at the red end due to 
the fringing (and possibly the strong O2 telluric feature at 
^7590 A). We therefore decided to analyse only 15 spectral 
light curves at the blue end and discard the remaining 15 for 
the remainder of this analysis. We also experimented with 
the two comparison stars. Given that the second comparison 
star is significantly fainter, it was excluded from the analysis 
and we only used the brighter one. Finally, we also produced 
a 'white' light curve, by summing up the flux over the first 
10 wavelength channels (the other 5 contained larger sys- 
tematics), prior to dividing through the target flux by the 
single comparison star's flux. The white light curve is shown 
in Fig. [2] and the spectral light curves are shown in Fig. [3] 

We calculated the theoretical photon noise for the white 
light curve and each spectral bin, taking into account the 
read noise and sky contribution (although negligible for high 
signal-to-noise data). Typical integrated electron counts per 
exposure for each spectral bin range from ^3.7 xlO 6 to 
2.1 xlO 7 for the target star, and ~7.9xl0 6 to 2.6xl0 7 for 
the comparison star, and vary slightly throughout the night 
with airmass. For the white light curve the typical integrated 
counts per exposure were ^1.3xl0 8 and 2.0xl0 8 electrons 
for the target and comparison star, respectively. This re- 
sults in (time-averaged) theoretical precision on the relative 
flux per exposure ranging from 2.9xl0 -4 to 6.2xl0 -4 for 
the spectral light curves, and ~1.1 xlO -4 for the white light 
curve. 

We also extracted auxiliary data from our target and 
comparison star's spectra to investigate the cause of the sys- 
tematics present in the light curves. The relative shift in the 
dispersion axis was measured by cross correlating each star's 
spectra with the first in the time-series. The relative shift in 
the cross-dispersion axis and the width of the spectral trace 
were found by fitting a Gaussian function to each column of 
the spectrum, and finding the average values per exposure. 
Over the course of the observations, the shift in the disper- 
sion and cross-dispersion axes were ~1.0 and 0.5 (binned) 
pixels respectively, and were the same for both stars (within 
the measurement error of ~ 0.1 pixels), showing that no sig- 
nificant rotation of the field occurred. The FWHM in the 
cross dispersion direction ranged from 6-12 (binned) pixels, 
resulting in varying spectral resolution. We found no obvious 
correlations between these measurements and the systemat- 
ics in the light curves, although the seeing variations are 
likely the most significant contributor given the relatively 
large changes. 
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Figure 1. Example spectra extracted from a single exposure. The 
black line is WASP-29, and the blue and green lines are the two 
comparison stars. The vertical dashed lines mark the extraction 
regions, with the shaded regions marking the gaps in the target 
spectrum between the detectors. Only the first 15 channels were 
used in the final analysis, due to significant fringing at the red 
end. 
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Figure 2. The 'white' light curve produced from the first 10 
wavelength channels of the GMOS spectra. Top: best fit model 
using a white noise analysis and the residuals. Clearly, significant 
systematics are present and must be accounted for in the fitting 
process. Bottom: best fit using a Gaussian process model with a 
Matern 3/2 covariance kernel (see text). The red line represents 
the best fit model, with the grey regions representing the 1 and 2 a 
limits of the instrument model plus white noise times the transit 
model. The dashed red line is the projection of the systematics 
model without the transit, along with the 1 and 2 a limits (now 
excluding the white noise term). 



3 LIGHT CURVE ANALYSIS 

The white light curve and the 15 spectral light curves were 
modelled in a variety of ways to account for the instrumen- 
tal systematics. We first analysed them using a simple white 
noise model to inspect the residuals and establish the na- 
ture of the systematics. In all cases the transit model was 
constructed using the analytic equations of Mandel & Agol 
(|2002|), and is similar to that described in Gibson et al. 



(2008). We assumed a circular orbit, and fixed the period 
(P) to 3.922727 days as given in|Hellier etH] ([2010]) . The 



remaining parameters of the transit model were the central 
transit time (Tc), the system scale (a/R*), the planet-to- 
star radius ratio (p = Rp/R*), the impact parameter (6), the 
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two quadratic limb darkening parameters (ci,C2), and two 
parameters of a linear baseline model of time (/ 00 t, ^grad)- 
We hereafter denote the transit model as T(t, </>), where t is 
a vector of time, <p is the vector of transit parameters, and 
/ as the vector of flux measurements. 

Unless otherwise stated, all of these parameters except 
P were fitted to the white light curve, and p, ci, C2, /oot 
and Tg r ad were fitted to the spectral light curves with the 
remaining parameters fixed to the final white light curve 
values (see Sect. 4.1). This is because we are interested in 



finding the relative values for p as a function of wavelength 
to produce our transmission spectrum, and we can condition 
on the values of Tc, a/R* and 6, that change the inferred 
values for p in the same way. 

Given the lack of correlations found between the sys- 
tematics and the auxiliary information extracted (see Sect. 
[2|, we model the systematics as time-correlated noise. The 
only differences in the following analyses are the noise mod- 
els used to account for instrumental systematics. These are 
described in turn. 



3.1 White noise analysis 

We first analysed all of the light curves using a simple white 
noise model, with likelihood given by 



p{f\t,<t>)=N{T{t,<t>),al\) 



(1) 



where Af(pL, 5]) is the multivariate normal distribution with 
mean pi and covariance matrix X, cr w is the uncertainty of 
each data point and I is the identity matrix. In other words 
we have a diagonal covariance matrix with all diagonal terms 
equal to cr^, representing a standard i.i.d. noise model as is 
commonly used to model transits. 

We then multiply the likelihood by the prior on the 
transit and noise parameters, p(</>, cr^), to produce the joint 
posterior probability distribution, and use a Monte-Carlo 
Markov-Chain (MCMC) to explore the posterior distribu- 
tion and produce marginal probabilities for each of the 
model parameters. In practice we do not explicitly state pri- 
ors for most parameters, implying uniform, improper priors. 
The exceptions are for the limb darkening parameters and 
impact parametei^] where we restrict the parameter to be 
positive using a step function of the form; 



r 0, if 
1 1, if 



x < 
x > 



specifying another improper prior. We also restrict the sum 
of the two limb darkening parameters c\ + C2 ^ 1 in a similar 
way, to ensure that the brightness of the stellar surface is 
positive. Four MCMC chains were run for each light curve, 
of length 100 000. We excluded the first 10% of each chain, 
and verified convergence by checking the Gelman & Rubin 



(GR) statistic ( |Gelman fe Rubin||1992) . The light curves 
along with their best fit models are shown in Figs. [2] and [3] 
for the white and spectral light curves, respectively. 

Clearly, the white noise model is incapable of account- 
ing for the correlated noise in the light curves, as seen in the 
residuals. We therefore analyse the residuals of each of the 



best fit models in an effort to understand the form of the sys- 
tematics. We first use the time- averaging method to obtain 
a simple estimate of the red nois e (|Pont et al.|2006 ) , follow- 
ing the procedure of | Winn et al. (2008), where the residuals 
are averaged into bins of width A/", and the RMS is calcu- 
lated as a function of N. The no ise should drop by 1 /y/N 
if it is uncorrelated in tim^J See |Gibson et al7| ( |2QQ9| ) for a 
more detailed description of this procedure. Fig.[4]shows an 
example of this for one of the light curves, clearly showing 
that there is time-correlated noise in the light curves. As a 
first attempt to account for correlated noise, we calculate 
the factor f3, which is the ratio of the RMS vs N plot to the 
theoretical noise in the white case. We chose the maximum 
value for this, and then scale the noise parameter, cr w , by 
this value, fix it, and re- fit the light curves using artificially 
inflated error bars to account for the systematics using the 
same MCMC procedure as before. /3 ranged from ^2.4 to 
3.8. We will hereafter refer to this as the 'white noise plus 
/?' model. 

This method is a useful way to estimate the additional 
uncertainties expected in the presence of systematic noise. 
However, it does not allow the form of the correlations to 
be modelled and therefore cannot produce more accurate 



parameter estimates (this is discussed in Carter &; Winn 



2009 ) , and therefore we consider more sophisticated models 



in the following sections. 



3.2 Gaussian process analysis 

We use a Gaussian process to model the time-correlated 



noise, similar to that described in Gibson et al. (2012b) and 



Gibson et al. (2012a), allowing us to model instrumental 
systematics as a stochastic rather than a deterministic pro- 
cess. This avoids the need to specify a parametric form for 
the systematics, which is often impossible to do, whilst also 
allowing for a much more flexible model. Furthermore, GPs 
are intrinsically Bayesian, thus avoiding the possibility of 
over- fitting systematics. The combination of a very flexible 
model and principled Bayesian inference effectively allows 
one to marginalise out any ignorance about the form of the 
systematics model and account for it whilst inferring transit 
parameters. This is extremely challenging to achieve using 
parametric models as it requires calculation of the Bayesian 
evidence (and therefore proper, usually informative, priors 
on the model parameters), and perhaps even marginalisation 
over many possible instrument models. 

Here we briefly describe the GP model, and refer to |Gib-| 
son et al. (2012b| and references therein for further details. 



A GP is a collection of random variables, any finite subset 
of which have a joint Gaussian distribution. Therefore we 
can write our GP as 



p(f\t, <t>,0) = N (T(t, <(>), S(t, 6)) . 



(2) 



The only difference to Eq. [T] is that we now consider the 
off diagonal elements of the covariance matrix. As well as 
specifying a mean function, in this case the transit function, 
with a GP model we must also specify a kernel function 



4 strictly speaking we should apply the same prior to p and a/R+, 
but this would have no effect on the inference. 



actually, yj , where M is the number of bins 



© 2002 RAS, MNRAS 000, [1^14] 



Transmission spectroscopy of WASP-29b 5 




0.60 0.65 0.70 0.75 0.80 0.60 0.65 0.70 0.75 0.80 0.60 0.65 0.70 0.75 0.80 
HJD - 2455853 HJD - 2455853 HJD - 2455853 



Figure 3. Spectral light curves with the central wavelengths marked, with a linear trend in time removed. Left panel: light curves fitted 
with the simple white noise model. Middle panel: residuals from the white noise fits. Right panel: same light curves as left panel with 
their best fit Gaussian process model in red. The grey shading represents the 1 and 2 a limits of the GP model (including white noise). 



which populates the elements of the covariance matrix and 
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has hyperparameter^fl, written as: 

6 A Gaussian process, QV(ii, X), is fully specified by its mean 
\x and covariance S. Parameters of both the mean function and 
kernel are known as hyperparameters. 
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Figure 4. Residuals analysis of one of the spectral light curves 
(6895 A). Top panel: Residuals from the best fit white noise 
model. The dashed lines represent the white noise RMS. Second 
panel: Plot of the RMS as a function of bin size N . The dashed 
line represents the theoretical curve expected for white noise, and 
the ratio of the two curves gives the factor /3, used to re-scale the 
uncertainties for the simple rescaling systematics model. Third 
panel: Autocorrelation function of the residuals. The solid green 
line marks the best fit Matern 3/2 kernel from the GP model, 
and the dashed red line zero correlation. Bottom panel: Power 
spectral density of the residuals. The dashed line marks a 1/f 
envelope. 



We will discuss the choice of the kernel function later. 

The above GP uses the kernel to model the residuals 
from the light curve model. Alternatively, we can model the 
light curve as the transit light curve times a GP, if we wish 
our systematics model to be multiplicative rather than ad- 
ditive, i.e.: 

f = T(t,<p,e)xgv(i,v(t,o))- 

In this case the joint probability distribution can be written 
for f/T(t,4>) as 

P (f/T(t,4,o)\t,4>)=jsr(i,ii(t,o))^ 

In practice it makes little difference which model we choose, 
given that transit light curves are shallow, and we could also 
combine multiplicative and additive GPs. For the remainder 
of this paper we use the latter, multiplicative model. 



We tested several different types of kernels to model 
the time-correlated noise, including the squared exponential, 
Matern and rational quadratic (see Ras mussen &; Williams| 
2006 for a detailed discussion of kernels, which is beyond the 
scope of this paper). In a fully Bayesian analysis, we could 
calculate the Bayesian evidence for each kernel, and use it to 
choose the best kernel, or alternatively even marginalise over 
them. However, not only is this computationally prohibitive, 
but it would also require us to specify proper (therefore 
informative) priors on the hyperparameters, which would 
make the evidence somewhat subjective. We therefore se- 
lected a kernel based on analysis of the white noise model 
residuals, and by running tests on simulated light curves. 

In the end we decided to use a Matern 3/2 kernel func- 
tion, given by: 

k(t n ,t m \0) = f (l + V3r]At) exp (-v^tyAt) +5 nm al, (3) 

where £ is a hyperparameter that specifies the maximum co- 
variance, At = \t n — t m | is the time difference, 77 is the inverse 
characteristic length scale, and 5 is the Kronecker delta. This 
kernel can be seen as a rougher version of the commonly used 
squared exponential kernel (i.e. that used in Gib son et al.| 
2012b|a ). One major motivation for this kernel is that is 
gives the best match to the auto-correlation function (ACF) 
for most of the residuals from the white noise model. This 
provides an estimate of how the data points are correlated 
with one another as a function of time lag, and an exam- 
ple is shown in Fig. [4] The green line in the ACF function 
marks the best fit covariance kernel (although in practice 
we marginalise over the kernel parameters). We also ran a 
series of tests on simulated light curves, described in the Ap- 
pendix, which further validate the use of the Matern kernel. 
We stress that selecting a kernel, whilst in some ways anal- 
ogous to selecting a parametric model, allows a much more 
flexible model than any parametric form and is also intrinsi- 
cally Bayesian. In addition, we ran much of the same analysis 
using the squared exponential kernel. This gave similar re- 
sults, and we therefore conclude that the choice of kernel is 
not critical for this particular dataset. 

Analogously to the white noise case, we can specify pri- 
ors for all the hyperparameters of the model, and multiply by 
the marginal likelihoocQto produce the posterior joint prob- 
ability distribution. This can then be optimised with respect 
to the hyperparameters using a Nelder-Mead simplex algo- 
rithm, or alternatively the marginal parameter distributions 
for each hyperparameter can be obtained by exploring the 
posterior distribution with an MCMC in just the same way 
as for the white noise model. The same priors were applied to 
the limb darkening parameters and 6, and we also specified 
hyperpriors for the hyperparameters £ and 77. These took the 
form of Gamma distributions with shape parameter unity, 
given by 



if x < 
exp (—x/l) , if x ^ 



where / is the length scale of the hyperprior. We set the 



7 called the marginal likelihood because in a GP we have 
marginalised over all the possible functions for each set of hy- 
perparameters 
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length scales for £ and rj as 10 ~ 3 and 200, respectively. These 
were not specified to influence the results of the inference, 
but rather to ease convergence of the MCMC chains (when 
both parameters are small they are unconstrained by the 
likelihood). Indeed, we checked that the length scales of the 
hyperpriors did not affect the inferred transit parameters. 

Whilst GPs are rather simple in theory, each evaluation 
of the marginal likelihood requires inversion of the covari- 
ance matrix, which makes a full marginalisation tedious as 
it requires 0(n 3 ) computations, and limits full GP analyses 
to relatively small datasets. Therefore in addition to the full 
GP marginalisation, we also use a technique called Maxi- 
mum Likelihood type-II (ML-II), where the hyperparame- 
ters of the covariance kernel are fixed to their maximum- 
posterior values, and we marginalise over the remaining 
transit parameters. Once the covariance hyperparameters 
are fixed this negates the need to invert the covariance ma- 
trix. This is a valid approximation when the posterior is 
sharply peaked with respect to the covariance hyperparam- 
eters, and is particularly useful when running many tests on 
the data, although for our final results we always use fully 
marginalised GPs (i.e. we marginalised over the covariance 
hyperparameters as well). 

We ran four chains for all light curves of length 100 000 
and 50 000 for the ML-II and full marginalisations, respec- 
tively. Each chain of length 50 000 for the full marginalisa- 
tion took about 17.5 minutes to compute using a single core 
on a standard desktop, compared to about 2 minutes and 
1 minute for 100 000 length chains with the ML-II method 
and white noise model, repectively. The run times do not 
scale as badly as one might expect with G(n 3 ) complexity, 
because the computation of the light curve model rather 
than the likelihood dominates for the simpler noise models. 
We tested for convergence in the same way as before. Due 
to degeneracy in the linear baseline and the GP model, we 
decided to fix the T gra d parameter for the full GP marginal- 
isation. This requires shorter chains to reach convergence, 
and doesn't effect the results as p did not significantly cor- 
relate with Tg r ad- We verified this with longer chains for a 
subset of the light curves. In a few cases, £ and/or r\ did 
not fully converge (the GR statistic was a few percent from 
unity), but only when one or both parameters were consis- 
tent with zero; however, in all cases p converged and there- 
fore the transmission spectrum is not affected. The best fit 
GP models to the light curves are shown in Figs. [2] and [3] 
To illustrate correlations in the parameters and convergence 
of the final marginal distributions, ID and 2D marginal dis- 
tributions are shown in Fig. [5] for the white light curve and 
one spectral light curve. 



3.3 Wavelet analysis 

As we have used a GP to fit for time-correlated noise only, it 
is worth exploring other methods in the literature to account 
for time-correlated noise. One such method is the wavelet 
method introduced by Carter &; Winn| ( |2009 ). It is valid 
when the power spectral density (PSD) of the noise takes 
the form: 



PSD oc 



where / is the frequency and 7 is the spectral index. The 
method is based on taking a wavelet transform of the resid- 
uals from the best fit model, and the likelihood is computed 
from the wavelet coefficients. The central idea of this method 
is that it diagonalises the covariance matrix of Eq. [2] when 
specified in the wavelet domain. Like GPs it models the sys- 
tematics as a stochastic process; however, it has the signif- 
icant advantage that a burdensome matrix inversion is not 
required for each evaluation of the likelihood and is therefore 
much faster. The extra computation required as compared 
to the white noise analysis is a fast wavelet transform, which 
is of order 0(n), and therefore does not significantly extend 
the computation time. 

In order to justify the wavelet method we analysed the 
PSD of all the residuals from the white noise fit as recom- 
mended by |Carter &; Winn ( |2009| . An example of this is 
shown in Fig. [4] In most cases the noise appeared consistent 
with l// 7 , by which we mean that low frequency compo- 
nents dominate, but the shape itself is hard to determine 
(the PSD should appear as noise within a l// 7 envelope). 
This supports the use of the wavelet based likelihood for 
our light curves, or at least suggests the noise properties are 
nearly 1/ f 1 . 

Our likelihood therefore took the form of Eq. 32 from 
|Carter &; Winn| ( |2009| ), and we multiplied by priors similarly 
to Sect. |3.l| to produce a posterior distribution. In addition 
to the white noise parameter cr w , the wavelet likelihood has 
a red noise parameter a r specifying the amplitude of the red 
noise component, and the spectral index 7. We fixed 7 to 1 
(as in |Carter &; Winn|2009| , and optimised and explored the 
joint posterior distribution with respect to the free transit 
parameters, plus cr w and a r . Chains lengths and run times 
were approximately the same as for the white noise analysis. 

3.4 Removal of common mode systematics 

Finally, we experimented with a simple method to remove 
common signals observed in the spectral light curves, to see 
if we could increase the precision in our transmission spec- 
trum. We were motivated to do this as we obtain a reduced 
X 2 significantly lower than one for our transmission spec- 
trum using all of the noise models (see Sect. 4.3). This is 
extremely unlikely to happen by chance, and is probably 
the result of similar systematic signals in the spectral light 
curves. Noise models will take all signals into account when 
calculating the uncertainties in p (as they should); however, 
common signals will increase the uncertainties for each point 
in the transmission spectrum. As we are trying to find the 
relative change in p with transmission spectroscopy (hence 
why we condition on fixed values of To, a/R* and 6), we 
tried to remove a common signal in all the light curves. 

This common signal is evident in Fig. [3] where many of 
the residuals appear to have the same shape. This signal is 
also very similar to the GP model fitted to the white light 
curve (red dashed line, Fig. [2|. Taking this signal into ac- 
count with the GP (or any other method) will increase the 
uncertainties in the calculated values of p, but this signal 
should not affect the relative values for p. We therefore di- 
vided through each spectral light curve by the GP system- 
atics model fitted to the white light curve prior to fitting 
each light curve with the methods described above. This 
will remove the common signal and allow the noise models 
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Figure 5. ID and 2D marginal distributions from the posterior probability distribution for the Gaussian process noise model. The lower 
left shows the probability distributions for the white light curve, and the upper right for one of the spectral light curves. The black lines 
in the 2D distributions mark the 1 and 2cr limits, respectively. Distributions for the four separate MCMC chains are shown in the ID 
histograms. 



to produce more independent data points. Of course the un- 
derlying physical signal may not be identical for all the light 
curves, but the noise models described above can also take 
into account any excess signal added or not removed by the 
procedure in the same way that they account for 'normal' 
systematics. In theory we could model this in more princi- 
pled ways, e.g. by modelling all light curves simultaneously 
with a common signal plus independent signals, or perhaps 
by using the white light curve instrument model as an input 
for the systematics model, but we choose not to pursue them 
here and follow this simple procedure. Each fit for the spec- 



tral light curves as described in Sects. |3J"| |3.2| and |3.3| was 

repeated by first dividing the light curve by the GP noise 
model for the white light curve. This procedure appeared 
to remove much of the common signal, and allowed for a 
more precise determination of the transmission spectrum, 
as described in the following section. 
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4 RESULTS AND CONCLUSIONS 
4.1 White light curve analysis 

The results from the white light curve fit are given in Tab.[l] 
for three of the noise models used: white noise plus f3 rescale, 
wavelets, and the Gaussian process after marginalising over 
all the transit parameters and covariance hyperparameters. 
The derived parameters are consistent with those reported 



Hellier et al. (2010) and Dragomir et al. (2011). In gen- 



eral, the three noise models gave consistent results. The sim- 
ple white noise model produced the smallest uncertainties 
(which we do not reproduce here), whereas the white noise 
plus /3 model gave the largest uncertainties. The wavelet 
and GP models both gave something in between. This is 
expected given that they both try to account for the form 
of the systematics model when inferring transit parameters, 
rather than just scaling the uncertainties to account for it. 
The fact that the wavelet and GP methods are not only 
consistent, but produce similar uncertainties is a strong val- 
idation of both techniques. We briefly discuss the relative 
merits of both methods in Sect. [5] and for the remainder of 
this paper we adopt the GP results, given that we cannot 
verify the 1/f nature of the noise, and they provide more 
conservative uncertainties. This maybe indicates that we are 
taking into account a larger range of possible systematic sig- 
nals. The white noise values fitted for the GP and wavelet 
methods are ^3.8 and 2.9 times the theoretical photon noise. 
The difference is perhaps due to the wavelet method absorb- 
ing some of the white noise into the systematic component 
(see e.g. Figs. 1 and 4 of Carter &; Winn|[2009[ where the 
1/f noise appears to contain a white component). Further 
sources of noise are accounted for by the time-correlated 
component of these models, given by the max covariance 
and red noise parameters quoted in Tab. ^ 

Using the distributions from the MCMC chains, we cal- 
culate further system parameters of WASP-29b. These are 
given in Tab. [2] Where the distributions were not avail- 
able, we generated normal distributed values from the val- 
ues given in the literature to propagate the uncertainties 
in the calculations properly. Again these results are con- 



sistent with the values reported in Hellier et al. (2010) and 
Dragomir et al. ( |2011| within the uncertainties, further con- 
firming WASP-29b's status as a Saturn-like exoplanet. 



4.2 Transit Ephemeris 

A new ephemeris was calculated for WASP-29 using the 
transit time derived for the white Gemini transit light curve, 



plus the ephemeris reported in Hellier et al. (2010) and the 
transit time from Dragomir et al.|fl2011| ). These were con- 
verted to HJD utc format, and a straight line of the form 

T C (E) =T C (0) + PE 

was fitted to the three transit times. The zero-point epoch 
was set equal to as near to the centre of mass of the three 
points as possible, weigthed as 1/c>t c . This was to minimise 
the covariance between the transit epoch and the period, 
which was verified after the fit. The chosen E — transit 
was 6 periods prior to the Gemini transit, giving epochs of 
-130, -98 and 6 for the three transits. The new ephemeris is 
reported in Tab. [2] 



Table 2. WASP-29 parameters derived from the MCMC poste- 
rior distribution of the GP fits. 



Parameter 



Value 



Unit 



Transit epoch, To 
Period, P 

Transit duration, T14 
Inclination, i 
Semi-major axis, a 
Stellar radius^, R* 
Planet mass" , M p 
Planet radius, R p 
Planet density, p p 
Surface gravity, logg p 
Equilibrium temp, T p 



2455830.18811 



+ 0.00016 



3.9227186 



+0.0000068 
-0.0000068 



O.H036i °oS 
89.171°;*° 



0.04565 




+ 0.00060 



0.00062 

± 0.044 



0.244 ± 0.020 
776+ - 043 

U - ' '°-0.043 

u.oo_ 09 



3.00 



+0.06 



970 



+32 



days 

days 

deg 

AU 

R Q 

Mj 

Rj 

PJ 

[cgs] 

K 



a Adopted from Hellier et al. (2010) 



4.3 Transmission spectrum 

The transmission spectra produced via the noise models dis- 
cussed in Sect. [3] are shown in Fig. [6] These are prior to 
removal of the common mode systematic. The horizontal 
dashed lines represent the weighted average and plus and 
minus three scale heights, calculated to be ^360 km (one 
scale height of 360 km corresponds to ~ 1.3 x 10 -4 in transit 
depth). In general the transmission spectra are all in broad 
agreement, and are remarkably flat, showing a featureless 
spectrum at the few parts in 10 -4 level. Uncertainties were 
smallest for the simple white noise model (not shown) and 
largest for the full GP marginalisation. However, the disper- 
sion of the points (i.e. the scatter around the average) was 
smallest for the full GP marginalisation. This implies that 
the GP model is doing a particularly good job at determin- 
ing the correct value for p, but for some reason overestimates 
the uncertainty (it is highly unlikely a draw from random 
noise would lead to such small dispersion). In fact, all of the 
noise models presented give reduced y/ 2 significantly smaller 
than 1. We propose that this is due to a common mode sys- 
tematic that the noise models take into account in a similar 
way for each wavelength channel, and led to the correction 
for this as discussed in Sect. 13.41 

Fig. [7] shows the transmission spectra produced after 
the common mode systematic correction. The spectra pro- 
duced by all noise models were again consistent, with the 
full GP model giving uncertainties typically 30 — 40% larger 
than the other models. We therefore choose to adopt these as 
our final uncertainties, for the reasons discussed in Sect. |4.l] 
These results are given in Tab. [3] The transmission spectrum 
is still consistent with a flat model, but we are able to place 
stronger constraints on the atmosphere of WASP-29b using 
the common mode correction. For the common mode cor- 
rected light curves, the white noise values fitted for the GP 
model ranges from 1.51 to 2.12 times the theoretical noise, 
and from 1.11 to 1.50 for the wavelet model. Similarly to 
the white light curve, further sources of noise are accounted 
for by the systematic component of these models. The lower 
ratios between the actual and theoretical white noise for the 
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Table 1. Parameters from the MCMC fits of the white light curves, given for three different noise models: white noise plus (3, wavelets, 
and the fully marginalised Matern 3/2 Gaussian process. 



Parameter 



White Noise +/3 



Noise model 
Wavelet 



GP 



Central transit time, Tq (YUDjjtc) 

Period, P (days) 

System scale, a/R* 

Planet-star radius ratio, R p /R* 

Impact parameter, b 

Linear limb darkening parameter, c\ 

Quad limb darkening parameter, C2 

Out-of-transit flux, / 00 t 

Time gradient, T grac i 

GP max covariance, £ 

GP inverse length scale, r\ 

White noise, dw 

Red noise, a T 



2455853.72515toooo28 
3.922727 (fixed) 

11.95j 

0.0998^ 



0.25 



p;+0.34 
-0.67 
2+0.0026 
-0.0017 
+0.16 
-0.16 
+0.076 



0.721 _ _ 
154+ - 135 

U.104t_o 094 

QQQ6q+ 00009 
-0 00014+ - 00005 

U.UUU14_o .00005 



0.00119 (fixed) 



2455853.72469lg;ogoi5 2455853.72442 



+0.00016 



3.922727 (fixed) 

0984+ - 0010 
u.uyo^_ 0009 



0.14 



+0.10 



+ 0.039 



0.745L w _ 

090+ - 095 
u.uyu_ 051 

QQQ63+ - 00008 
u.yyyuo_ 00008 



-0.00008 



+0.00004 
-0.00003 



U.UUUOZ4t_ ooooi 

n 002^1 + 00033 

u.uuzoi_ 00030 



3.922727 (fixed) 



12.36±g;^ 

0982+ - 0015 
u.uyoz_ 0015 



0.14 
0.698 
0.142 



+0.11 
0.09 
+0.059 
0.089 
+0.142 
0.084 



QQQ 6 5+0-00031 
u.yyyuo_ .00033 

-0.00008 (fixed) 



0.00058 



+ 0.00033 



+ 14.9 



25.7 
0.000426^ 



spectral light curves as compared to the white light curve 
perhaps indicate that atmospheric transmission corrections 
using the comparison star are best done in narrow wave- 
length ranges. 

Fig. [8] shows the transmission spectrum produced with 
the full GP model, now with several model transmission 
spectra of WASP- 29b overplotted. These forward models 
were produced using the nemesis retrieval tool (jlrwin et al 



2008), a radiative transfer code originally developed to inves- 
tigate the atmospheres of Solar system planets, and recently 
adapted for exoplanet transmission spectra ( |Lee et aL] 2012 



Barstow et al. submitted). The grey line shows a model con- 
taining a purely H2 and He atmosphere. The green, red and 
blue lines are with 100 ppmv H2O and 1, 5 and 10 ppmv of 
Na and K added, respectively. Gas absorption line data is 



from [Rothman et al.| fl2010| ) and |Kupka et ah] ( |20QQ| ) for the 
H2O and the alkali metals, respectively. Given the precision 
of the transmission spectrum, we do not attempt a detailed 
retrieval here, rather the models are plotted for reference to 
show the scale of potential features. 

Using our data we can only realistically rule out cloud- 
free atmospheres with significant amounts of Na, given the 
lack of a pressure broadened feature. However, a Na rich 
atmosphere with thick clouds or Rayleigh scattering haze is 
not ruled out. Of course, another explanation is that elemen- 
tal Na is simply not present. This is likely for atmospheres of 
~1000K or cooler (e.g. |Burrows et al.||2000 ), where atomic 
Na can be lost in compounds such as disodium monosulphide 
(Na2S) or ansite (NaAlSisOs), depending on the presence 
of other species in the atmosphere. Similar observations at 
shorter wavelengths could distinguish between a flat feature- 
less spectrum and one dominated by a Rayleigh scattering 
haze, such as the prototypical HD 189733b (Pont et al.|2008| 
Sing et al. 2012), and higher resolution (at similar S/N) 
is required to rule out the presence of a Na core if there 
are clouds or haze present. For comparison the HD 189733b 
transmission spectrum varies by about 2 scale heights over 



Table 3. Transmission spectrum of WASP-29b using the full GP 
marginalisation and after removal of the common mode system- 
atic. 



Wavelength 




P 




Ap 


5211. lA 





0961 





0027 


5348.0A 





0979 





0016 


5485.0A 





0971 





0018 


5621. 9A 





0970 





0016 


5758.8A 





0973 





0013 


5895.8A 





0974 





0013 


6032.7A 





0976 





0013 


6169.6A 





0981 





0013 


6306.6A 





0970 





0014 


6444.9A 





0969 





0014 


6616.0A 





0973 





0016 


6755.7A 





0975 





0032 


6895.4A 





0969 





0019 


7035.0A 





0970 





0032 


7174.7A 





0970 





0018 



this spectral range due to a Rayleigh scattering haze. De- 
spite the lack of constraints we can place on the atmosphere, 
we can rule out WASP-29b having a similar atmosphere to 
the other prototypical hot Jupiter HD 209458b, given the 
lack of a strong pressure broadened Na feature. As stated, 
Na is likely to form compounds below ^1000 K. Given its 
prominence in HD 209458b's transmission spectrum, this 
could indicate another significant transition in classes of hot 
Jupiter atmospheres. 



5 DISCUSSION 

We have presented Gemini GMOS observations of the trans- 
mission spectrum of WASP-29b, using the technique of 
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Figure 8. Transmission spectrum of WASP-29b using the full Matern 3/2 GP noise model, and after removal of the common mode 
systematic. The horizontal dashed lines represent the weighted average and plus and minus three scale heights, calculated to be ~360km 
(one scale height of 360km corresponds to ~ 1 x 10 -4 in transit depth). The grey line shows a model containing a purely H2 and He 
atmosphere. The green, red and blue lines are with 100 ppmv H2O and 1, 5 and 10 ppmv of Na and K added, respectively. These are not 
fitted to the data, but simply over plotted for reference. The models are plotted with a resolution of 50A; the points do not significantly 
change when plotted at the resolution of the spectrum. 



differential spectro-photometry. Using a single comparison 
star, we reached precision on the transit depth of ~ 1 x 10 -4 
showing that GMOS can provide precision spectrophotom- 
etry at the level needed to probe the atmospheres of extra- 
solar planets. 

Using the 'white' light curve, we refined the system 
parameters and ephemeris for WASP-29, finding them to 



be consistent with previous studies (Hellier et al. 2010 



Drag omir et al.||20lT ). Despite picking WASP-29 as a test 



case for GMOS transmission spectroscopy, the precision at- 
tained allows us to rule out a Na rich, cloud/haze free atmo- 
sphere, given the lack of a pressure broadened Na feature. 
This indicates that Na is not present in the atmospheres 
of cooler 'hot' J up iters, or that clouds and/or hazes play 
an important role and mask the pressure broadened alkali 
metal signatures in WASP-29b's upper atmosphere. The for- 
mer explanation is perhaps more likely, although a spectrum 
covering a larger wavelength region is required to confirm 
this, and it is of course possible that both are true. A higher 
resolution spectra (at similar S/N) is required to rule out the 
presence of a narrow Na core if clouds or hazes dominate. 
We note that this represents the first transmission spectrum 
of a hot-Saturn planet. 

We have also presented a detailed analysis and com- 
parison using various types of noise models to account for 
the GMOS systematics. Rather than search for correlations 
with observational parameters such as seeing, airmass, etc., 
we decided to focus on blind methods to account for the sys- 



tematics, i.e. with no additional input parameters used to 
model the systematics. The methods tried include a white 
noise model, a simple rescaling of the photometric uncertain- 



ties, the wavelet method of Carter & Winn (2009), and the 



Gaussian process model of Gibson et al. (2012b) applied to 



time correlated noise. The more sophisticated methods gave 
similar uncertainties, verifying their usefulness for analysis 
of time-correlated systematics. 

In general, we restate some of the conclusions of |Carter| 



& Winn (2009), that any model taking into account time- 



correlated noise is better than ignoring it, and that an anal- 
ysis of the residuals using ACFs and PSDs are especially 
useful in guiding the choice of noise model required. The 
wavelet and GP models give similar results and uncertain- 
ties for the white light curve, and within about 40% for the 
uncertainties in the transmission spectrum. Despite GPs giv- 
ing a slightly more conservative estimate of the uncertainties 
for the GMOS data, the wavelet method is perhaps preferred 
in general for the analysis of time-correlated noise when the 
PSD of the residuals follow a 1/f distribution, given its 
much faster execution time (although we note that this is 
hard to verify for individual datasets, and care must be taken 
for low significance results). The GP method is more gen- 
eral, and can be applied to almost any noise model given 
a suitable choice of kernel (potentially even non- stationary 
noise) , non-regularly spaced data, and can incorporate arbi- 
trary numbers of input vectors into the stochastic function. 
This allows physical systematics models to be folded into 
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Figure 6. Transmission spectra of WASP-29b produced by the 
various noise models prior to removal of the common mode sys- 
tematic. The horizontal dashed lines represent the weighted av- 
erage and plus and minus three scale heights, calculated to be 
~360km (one scale height of 360 km corresponds to ~ 1.3 x 10 — 4 
in transit depth). In all cases the x 2 of a flat model is consider- 
ably lower than 1, indicating that the uncertainty in the relative 
planet-to-star radius ratio might be overestimated due to common 
mode systematics. 



the stochastic part of the GP, therefore allowing principled 
Bayesian inference of the instrument model and negating the 
need to specify the instrument model in closed form ( |Gibson| 
et al. 2012b a). However, this added functionality comes at 
a significant runtime cost, and restricts the use of GPs up to 
datasets of ^1000 points (at least using full marginalisation 
over the hyperparmeters with MCMC methods). Investiga- 
tions into sparse GP models may allow their application 
to larger datasets (e.g. |Quinonero-Candela &; Rasmussen] 
2005). We finally note that given the difficulty in dealing 



with systematic noise, the use of multiple, complimentary 
techniques is desirable where possible, although perhaps the 
only truly robust way to confirm results is to repeat mea- 
surements. 
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APPENDIX A: GP TRIALS ON SIMULATED 
LIGHT CURVES 

In order to choose the best GP kernel for time-correlated 
noise, we ran a series of tests on simulated light curves, with 
injected 'systematic' noise. This appendix briefly describes 
our results. 

In total we simulated 2 400 light curves with 250 data 
points. For each light curve we set the transit parameters 
as follows: P = 4.0 days, a/R* = 12.0, p = 0.1, b = 0.25, 
ci = 0.2, and c<i — 0.2. A white and red noise term were 
then picked from a uniform distribution between 0.0001 to 
0.0006. The injected systematic signal was simulated in a 
variety of ways. First, we created 'function noise', where 
we summed 100 exponential, Gaussian and sinusoidal func- 
tions with random parameters (within sensible limits), and 
rescaled so that the mean and standard deviation were equal 
to unity and the red noise term, respectively. Second, we cre- 
ated 1/f noise in a similar way to Carter &; Winn (2009). 
We created a signal in the Fourier domain corresponding 
to a PSD of 1//, by setting a random amplitude within a 
1/f ' 5 envelope (the PSD is the square of the Fourier trans- 
form magintude), with a corresponding random phase. The 
inverse Fourier transform then produced the systematic sig- 
nal. The signal was then scaled to have mean and standard 
deviation in the same way as before. Finally, we created 
some signals using a combination of these methods, by sim- 
ply adding the two signals each with individual red noise 
terms chosen (and rescaling to a mean of 1). The model 
light curve was multiplied by the systematic signal, and the 
white noise was finally added. Out of the 2 400 light curves, 
800 were created using 'function noise', 800 with 1/f noise, 
and 800 with the combined noise model. The light curves 
were inspected to ensure they appeared realistic. 

Each light curve was fitted using a white noise model, 
as described in Se ct. |3.1| and the Gaussian process model 
described in Sect. |3.2[ this time using a range of kernels, 
including the squared exponential (SE), rational quadratic 
(RQ) and Matern 3/2 (MAT). The MAT kernel is already 
defined in Eq.[3] The SE kernel is: 



k(t n ,t m \0) = e 



; exp (-r]At 2 ) + dnmcr'i. 



This kernel has the same parameters as the MAT kernel, 
only the shape changes. The MAT kernel is more sharply 
peaked at At = 0, resulting in a rougher function, whereas 
the SE is an infinitely differentiable, smooth function of the 
input. The RQ kernel is: 



k{t n ,tm\0) =f 1 + 



At 2 
2af 



where a is a shape parameter and I is a length scale parame- 
ter. This kernel is a smooth function, and is a scale mixture 
of SE kernels with different characteristic length scales. The 
limit as a — » oo is the SE kernel. For a detailed discussion 
of GP kernels see iRasmussen & Williamsl (2006). 

We ran a single MCMC chain of length 5000 for each 
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fit, and discarded the first 1000 points. We checked for con- 
vergence visually for a subset of the light curves. We fitted 
for Tc, p, a/R* and / 00 t, and the kernel hyperparameters. 6, 
ci, C2 and T gra d were held fixed. This was a compromise to 
maintain degeneracies in the fit, but also to allow for shorter 
MCMC chains and therefore substantial numbers of trial 
light curves. We followed the approach of |Carter &; W inn 
(2009) to analyse the results. We calculated the 'number-of- 
sigma' statistic for each parameter, 

M = (p-p)/a p , 

where p is the parameter estimate from the MCMC chain, 
a p is the uncertainty, and p is the true parameter value. This 
statistic should be distributed with a mean and variance of 
and 1, respectively, if the parameter uncertainties from the 
model fits are Gaussian with the derived uncertainty. The 
results are plotted in Fig. |Al| for the white noise model and 
the three GP models. The standard deviation for the fitted 
parameters are given in each plot, along with their average 
distance from 1.0 (calculated as the standard deviation with 
fixed mean of 1.0). 

These results show that the MAT kernel outperforms 
the others for analysis of time-correlated noise, therefore we 
selected it for the GMOS analysis. However, we note that 
this is only valid for the specific noise models we tried, and 
varies significantly with varying noise parameters. We also 
note that these results are likely to depend in a complex 
way on the parameters of the light curve and the number of 
data points, as well as the injected noise properties. These 
tests were designed as a simple way of choosing the best ker- 
nel for time-correlated noise like we see in the GMOS light 
curves, and are not intended to be complete. Indeed, the 
right kernel to use is probably best selected on a case-by- 
case basis. Perhaps most importantly, we note that all three 
kernels invariably gave results significantly better than the 
simple white noise model. This demonstrates that any rea- 
sonably chosen kernel performs better than a simple white 
noise analysis when time-correlated noise is present. 
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Figure Al. Histograms of the 'number-of-sigma' statistic for each of the noise models used. The plots are colour coded and show the 
distributions for the central transit time, system scale, and the planet-to-star radius ratio. The standard deviation of each is given, along 
with the average (see text). The dashed line shows a Gaussian with a mean of zero and standard deviation of 1, i.e. what an ideal noise 
model would produce. 
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